feat(cala-core): Phase 3 — extend loop + mutation queue #138

Merged
daharoni merged 12 commits into main from feat/cala-phase-3
Apr 19, 2026
Conversation

@daharoni
Contributor

Summary

Phase 3 of the CaLa port: the extend loop and fit ↔ extend coordination protocol (design §3, §7.2–§7.3, thesis Algorithms 9 + 10). Extend proposes new components from the residual bip-buffer; fit applies mutations atomically between frames. Cold-start works — the pipeline runs from frame 0 with empty Footprints, discovers components, and produces traces correlated with ground truth on a dense synthetic recording.

Phase 3 acceptance: extending_cold_start_e2e runs 500 frames of a 10-cell + slow-baseline + neuropil synthetic, starting with empty Footprints, and recovers 6/10 cells at (spatial overlap ≥ 0.4, trailing-window trace correlation ≥ 0.5). Class-aware gates fire — both cell-class and neuropil-class estimators register. Infrastructure deliverable, not a tuned benchmark.

What lands

Buffers (crates/cala-core/src/buffers/):

  • ResidualRingBuf — 2n-allocated "bip-buffer" where every push writes primary + mirror slot, so the most recent capacity frames are always a single contiguous &[f32] regardless of wrap state. O(1) window, no per-cycle scratch copy.

Extend (crates/cala-core/src/extending/):

  • segment::variance_map / argmax_yx / patch_bounds / extract_patch_stack / select_max_variance_patch — thesis Alg 9 lines 1–4.
  • segment::rank1_nmf — projected alternating LS non-negative rank-1 factorization. Handles signed residual input without pre-clipping; output a is unit-L2.
  • segment::classify_candidate — Alg 9 quality gates plus design §3.1 class-aware shape priors. Classifies as Cell / Neuropil / SlowBaseline or rejects with a typed reason.
  • overlap::patch_to_frame_support / overlap_fraction — spatial intersection via two-pointer merge on sorted supports.
  • redundancy::pearson_correlation — temporal redundancy gate.
  • merge::merge_components — reconstructed-movie rank-1 NMF for two components; preserves NMF scale invariance.
  • mutation::PipelineMutation — Register / Merge / Deprecate with snapshot_epoch versioning.
  • mutation::Snapshot — deep-clone of (Ã, W, M, epoch) (row-level COW is a later optimization; the protocol surface stays the same).
  • mutation::MutationQueue — bounded FIFO, drop-oldest policy, drops counter for archive metrics.

Fit-side apply (crates/cala-core/src/fitting/pipeline.rs):

  • FitPipeline::apply_mutation / drain_apply — atomic Register/Merge/Deprecate extending (Ã, C̃, W, M) in one step, advancing epoch. Applied / Stale / Invalid outcomes; ApplyBatchReport aggregates counts.
  • Asset surgery: Footprints::push_component_classified + deprecate_by_id (stable u32 ids), Traces::insert_component_with_history + remove_component, SuffStats::insert_empty_component + remove_component.

Config: ExtendConfig extends the config.rs pattern with 16 tunables — window length, patch radius, NMF iteration/tolerance, recon-error ceiling, class-aware diameter ranges (cell / neuropil), cell compactness floor, support threshold, overlap + redundancy thresholds, queue capacity, per-cycle proposal cap. Every knob has a documented DEFAULT_* and a validated with_* builder. ComponentClass enum (Cell / SlowBaseline / Neuropil) and DEFAULT_COMPONENT_CLASS = Cell for back-compat with Phase 2 callers.

Tests: ~115 new tests across 7 test files covering each primitive, the mutation protocol, the apply outcomes, and the cold-start E2E.

Design notes

  • Candidate-plus-existing merge path is intentionally deferred. Thesis Alg 10 merges a candidate with an existing component via reconstructed-movie rank-1 NMF, whereas PipelineMutation::Merge takes two existing ids (matching design §7.3). The cold-start E2E currently uses "redundant → skip"; the full thesis-Alg-10 path (emit Deprecate(existing) + Register(merged) on redundancy) can be wired in via merge_components but is left off in the E2E to keep the test honest about what Phase 3's bare infrastructure delivers without tuning.
  • Component-class tags on Footprints added non-breakingly: existing push_component still returns usize (position) and defaults to ComponentClass::Cell. The Phase 3 caller uses push_component_classified for the id-stable u32 return. Positions shift when ids deprecate; ids never do.
  • No hardcoded magic numbers. Every tuning knob for Phase 3 flows through ExtendConfig + RecordingMetadata, consistent with the Phase 1/2 discipline. The E2E test documents each override against its default.

Test plan

  • cd crates/cala-core && cargo test passes
  • cargo clippy --tests clean (no new warnings)
  • cargo check --target wasm32-unknown-unknown --no-default-features --features jsbindings clean
  • cargo check --no-default-features --features pybindings clean
  • extending_cold_start_e2e::cold_start_dense_recovery demonstrates: epoch > 0, recall ≥ 4/10, k ≤ 50, at least one non-cell class

🤖 Generated with Claude Code

daharoni and others added 12 commits April 18, 2026 08:31
Empty module stubs for `src/extending/{mod,segment,overlap,redundancy,
merge}.rs` and `src/buffers/{mod,bipbuf}.rs`, plus `ExtendConfig` with
`DEFAULT_*` constants for the Phase 3 knobs: window length, patch
radius, NMF iteration/tolerance, quality-gate thresholds (recon
error, class-aware diameter ranges for cell/neuropil, cell
compactness), overlap + redundancy fractions, mutation queue
capacity, and per-cycle proposal cap.

`ComponentClass` enum (Cell / SlowBaseline / Neuropil) lands here
for class-aware shape priors; Phase 2 footprints remain the default
`Cell` class via `DEFAULT_COMPONENT_CLASS`.

Builder methods follow the `PreprocessConfig`/`FitConfig` precedent:
every tuning knob has a documented default and a validated override,
and `cargo test --test config_metadata` pins the defaults so silent
tweaks cannot land without test churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2n-allocated "bip-buffer" for the Phase 3 extend loop: every push
writes the frame into both the primary slot and its mirror at
`+ capacity`, so the most-recent `capacity` frames are always
readable as a single contiguous `&[f32]` regardless of wrap state.
Avoids per-cycle scratch copies in extend's variance / NMF passes.

Window ordering is oldest-to-newest over the slice, saturating `len`
at `capacity`. `frame(i)` / `latest()` expose per-frame access;
`clear()` resets without reallocating.
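
A minimal sketch of the mirror-write idea described above, not the code from `bipbuf.rs`. The type name `ResidualRing`, its fields, and the frame-major layout are assumptions made for illustration; only the 2n allocation, the primary + mirror write, and the oldest-to-newest contiguous window come from the description.

```rust
/// Illustrative bip-buffer: `2 * capacity` frame slots, every push written twice.
pub struct ResidualRing {
    data: Vec<f32>,   // 2 * capacity * frame_len values, frame-major
    frame_len: usize, // values per frame
    capacity: usize,  // number of frames in the window
    head: usize,      // next primary slot to write, in 0..capacity
    len: usize,       // valid frames, saturates at capacity
}

impl ResidualRing {
    pub fn new(capacity: usize, frame_len: usize) -> Self {
        assert!(capacity > 0 && frame_len > 0, "capacity and frame_len must be non-zero");
        Self { data: vec![0.0; 2 * capacity * frame_len], frame_len, capacity, head: 0, len: 0 }
    }

    /// Write the frame into the primary slot and its mirror at `+ capacity`.
    pub fn push(&mut self, frame: &[f32]) {
        assert_eq!(frame.len(), self.frame_len, "frame length mismatch");
        let primary = self.head * self.frame_len;
        let mirror = (self.head + self.capacity) * self.frame_len;
        self.data[primary..primary + self.frame_len].copy_from_slice(frame);
        self.data[mirror..mirror + self.frame_len].copy_from_slice(frame);
        self.head = (self.head + 1) % self.capacity;
        self.len = (self.len + 1).min(self.capacity);
    }

    /// The most recent `len` frames, oldest to newest, as one contiguous slice.
    pub fn window(&self) -> &[f32] {
        // Before the first wrap the data starts at slot 0; once full, the oldest
        // frame sits at `head` and the mirror half covers the wrapped tail.
        let start = if self.len < self.capacity { 0 } else { self.head };
        &self.data[start * self.frame_len..(start + self.len) * self.frame_len]
    }
}
```

With `capacity = 3`, pushing four single-value frames 1, 2, 3, 4 leaves the backing store as `[4, 2, 3, 4, 2, 3]`, and `window()` returns the slice `[2, 3, 4]` starting at slot 1 without copying.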

Tests cover: empty/full/partial states, single and many-wrap
behavior, contiguity across every head offset, mirror-write
correctness (no stale-slot leak), push-length validation, and
zero-argument constructor panics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First stage of thesis Algorithm 9: compute per-pixel residual
variance over the extend window, locate the argmax pixel, and
extract a radius-`r` patch time stack clipped to frame bounds.
Downstream tasks (Task 4 rank-1 NMF, Task 5 quality gates) consume
this as input.

`variance_map`, `argmax_yx`, `patch_bounds`, `extract_patch_stack`
are individually public so the Phase 3 extend harness can exercise
each stage. `select_max_variance_patch` composes them into the
per-cycle entry point.
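
The first stage can be sketched as follows. This is an illustration under stated assumptions, not the signatures in `segment.rs`: the window is taken to be frame-major (`n_frames × n_pixels`), variance is the population variance, NaNs are skipped, and the first maximum wins ties. The real tie-breaking and NaN policy are only known to be covered by tests.

```rust
/// Per-pixel population variance over a frame-major window.
pub fn variance_map(window: &[f32], n_pixels: usize) -> Vec<f32> {
    assert!(n_pixels > 0 && window.len() % n_pixels == 0, "shape mismatch");
    let n_frames = window.len() / n_pixels;
    if n_frames == 0 {
        return vec![0.0; n_pixels];
    }
    let mut sum = vec![0.0f64; n_pixels];
    let mut sum_sq = vec![0.0f64; n_pixels];
    for frame in window.chunks_exact(n_pixels) {
        for (p, &v) in frame.iter().enumerate() {
            sum[p] += v as f64;
            sum_sq[p] += (v as f64) * (v as f64);
        }
    }
    let t = n_frames as f64;
    sum.iter()
        .zip(&sum_sq)
        .map(|(&s, &q)| {
            let mean = s / t;
            // E[x^2] - E[x]^2 can dip below zero from rounding; clamp it.
            (q / t - mean * mean).max(0.0) as f32
        })
        .collect()
}

/// (y, x) of the largest non-NaN value; the first occurrence wins ties (assumed policy).
pub fn argmax_yx(map: &[f32], width: usize) -> Option<(usize, usize)> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in map.iter().enumerate() {
        if v.is_nan() {
            continue;
        }
        if best.map_or(true, |(_, b)| v > b) {
            best = Some((i, v));
        }
    }
    best.map(|(i, _)| (i / width, i % width))
}

/// Half-open bounds (y0, y1, x0, x1) of a radius-`r` patch clipped to the frame.
pub fn patch_bounds(y: usize, x: usize, r: usize, height: usize, width: usize) -> (usize, usize, usize, usize) {
    (y.saturating_sub(r), (y + r + 1).min(height), x.saturating_sub(r), (x + r + 1).min(width))
}
```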

Tests cover: empty buffer, constant residual, hand-computed
variance, negative-variance clamp, argmax tie-breaking / NaN
handling, corner / interior / over-sized patch clipping, frame-
order preservation in the time stack, and shape-mismatch panics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Projected alternating least squares for `X ≈ a c^T` with `a, c ≥ 0`.
Handles signed residual input (no pre-clip required) — each update
computes the closed-form LS step and clamps negatives, which is
sufficient for a rank-1 non-negative factor.

Output convention: `a` is unit-L2 normalized, `c` carries the
scale. Downstream quality gates (Task 5 compactness / diameter)
assume the unit-L2 contract.

`Rank1Nmf` reports iteration count, convergence flag, and the
relative Frobenius reconstruction error used by the first quality
gate (`recon_error_max` in ExtendConfig).
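
A compact sketch of projected alternating least squares for the rank-1 case, assuming `X` is stored row-major as `n_pixels × n_frames`, an all-ones initial `c`, and a stopping rule based on the change in relative error. The names `Rank1Fit` and `rank1_nmf_sketch`, the initialization, and the stopping rule are illustrative choices; only the clamp-after-LS update, the unit-L2 `a`, and the relative Frobenius error come from the description above.

```rust
pub struct Rank1Fit {
    pub a: Vec<f32>,      // spatial factor, unit L2 norm (or all zeros if degenerate)
    pub c: Vec<f32>,      // temporal factor, carries the scale
    pub iterations: usize,
    pub converged: bool,
    pub recon_error: f32, // ||X - a c^T||_F / ||X||_F
}

pub fn rank1_nmf_sketch(x: &[f32], n_pixels: usize, n_frames: usize, max_iter: usize, tol: f32) -> Rank1Fit {
    assert_eq!(x.len(), n_pixels * n_frames, "shape mismatch");
    let zero = |iterations, converged, recon_error| Rank1Fit {
        a: vec![0.0; n_pixels],
        c: vec![0.0; n_frames],
        iterations,
        converged,
        recon_error,
    };
    let x_norm = x.iter().map(|v| v * v).sum::<f32>().sqrt();
    if x_norm == 0.0 {
        return zero(0, true, 0.0); // zero input short-circuits
    }
    let mut a = vec![0.0f32; n_pixels];
    let mut c = vec![1.0f32; n_frames]; // assumed initialization
    let (mut prev, mut err, mut converged, mut iterations) = (f32::INFINITY, 1.0f32, false, 0);
    for it in 1..=max_iter {
        iterations = it;
        // a-step: closed-form LS solution X c / (c^T c), then clamp negatives.
        let cc: f32 = c.iter().map(|v| v * v).sum();
        if cc == 0.0 {
            return zero(it, false, 1.0);
        }
        for (p, ap) in a.iter_mut().enumerate() {
            let row = &x[p * n_frames..(p + 1) * n_frames];
            let dot: f32 = row.iter().zip(&c).map(|(xv, cv)| xv * cv).sum();
            *ap = (dot / cc).max(0.0);
        }
        let a_norm = a.iter().map(|v| v * v).sum::<f32>().sqrt();
        if a_norm == 0.0 {
            return zero(it, false, 1.0);
        }
        a.iter_mut().for_each(|v| *v /= a_norm);
        // c-step: with a^T a = 1 the LS solution is X^T a, then clamp negatives.
        for (t, ct) in c.iter_mut().enumerate() {
            let dot: f32 = (0..n_pixels).map(|p| x[p * n_frames + t] * a[p]).sum();
            *ct = dot.max(0.0);
        }
        let mut err_sq = 0.0f32;
        for p in 0..n_pixels {
            for t in 0..n_frames {
                let d = x[p * n_frames + t] - a[p] * c[t];
                err_sq += d * d;
            }
        }
        err = err_sq.sqrt() / x_norm;
        if (prev - err).abs() < tol {
            converged = true;
            break;
        }
        prev = err;
    }
    Rank1Fit { a, c, iterations, converged, recon_error: err }
}
```

On an exact rank-1 input such as the outer product of `[3, 4]` and `[1, 2, 3]`, the sketch returns `a ≈ [0.6, 0.8]` and `c ≈ [5, 10, 15]`, which shows the scale moving into `c`.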

Tests cover: exact-rank-1 recovery, non-negativity on signed input,
zero-input short-circuit, unit-L2 invariant, shifted off-center
patch, noisy non-convergence at max_iter, and the recon-error
formula against an independent Frobenius-ratio computation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thesis Algorithm 9 quality-gate suite plus design §3.1 class-aware
shape priors. `classify_candidate` runs, in order:

  1. Reconstruction-error gate  (`recon_error_max`)
  2. Support-mask extraction    (`footprint_support_threshold_rel`)
  3. 2-D morphology             (area, 4-conn perimeter, compactness)
  4. Class classification:
       diameter < cell_min   → reject BelowCellMin
       cell_min..=cell_max   → Cell (requires compactness ≥ floor)
       cell_max..neuropil_min → reject AmbiguousDiameter
       neuropil_min..=neuropil_max → Neuropil
       above neuropil_max    → SlowBaseline

Diameter bounds are multiples of `neuron_diameter_um`-in-pixels, so
class boundaries scale with recording. Compactness is the standard
isoperimetric quotient `4π·area/perimeter²` clamped to [0, 1].
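
The class ladder in step 4 can be sketched as a single function over a precomputed diameter and compactness. This is a hedged illustration: `Bounds`, `Class`, `Reject`, and the `LowCompactness` variant are hypothetical names (only `BelowCellMin` and `AmbiguousDiameter` appear above), the bounds are assumed to be already converted to pixels, and how the diameter is measured from the support mask is not specified here.

```rust
#[derive(Debug, PartialEq)]
pub enum Class {
    Cell,
    Neuropil,
    SlowBaseline,
}

#[derive(Debug, PartialEq)]
pub enum Reject {
    BelowCellMin,
    LowCompactness, // hypothetical name for the compactness-floor failure
    AmbiguousDiameter,
}

/// Diameter bounds in pixels plus the cell compactness floor (illustrative struct).
pub struct Bounds {
    pub cell_min: f32,
    pub cell_max: f32,
    pub neuropil_min: f32,
    pub neuropil_max: f32,
    pub compactness_min: f32,
}

/// Isoperimetric quotient 4π·area / perimeter², clamped to [0, 1].
pub fn compactness(area: f32, perimeter: f32) -> f32 {
    if perimeter <= 0.0 {
        return 0.0;
    }
    (4.0 * std::f32::consts::PI * area / (perimeter * perimeter)).clamp(0.0, 1.0)
}

/// Walk the diameter ladder from smallest to largest.
pub fn classify(diameter_px: f32, compactness: f32, b: &Bounds) -> Result<Class, Reject> {
    if diameter_px < b.cell_min {
        Err(Reject::BelowCellMin)
    } else if diameter_px <= b.cell_max {
        // Only the cell class carries a compactness requirement.
        if compactness >= b.compactness_min {
            Ok(Class::Cell)
        } else {
            Err(Reject::LowCompactness)
        }
    } else if diameter_px < b.neuropil_min {
        Err(Reject::AmbiguousDiameter)
    } else if diameter_px <= b.neuropil_max {
        Ok(Class::Neuropil)
    } else {
        Ok(Class::SlowBaseline)
    }
}
```

For reference, a 4×4 square support with a 4-connected perimeter of 16 has compactness `4π·16/16² = π/4 ≈ 0.785`, so a square footprint passes any floor below that value.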

New config knob `footprint_support_threshold_rel` (default 0.1)
controls support extraction; added to DEFAULT_* suite + pinned by
config_metadata tests.
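
As a rough illustration of the `DEFAULT_*` plus validated `with_*` pattern for this knob: the 0.1 default is the value stated above, while the struct name `ExtendConfigSketch`, the exact constant and method names, the `(0, 1)` validity range, and the `String` error type are assumptions.

```rust
/// Default relative threshold for support extraction (0.1, as documented above).
pub const DEFAULT_FOOTPRINT_SUPPORT_THRESHOLD_REL: f32 = 0.1;

/// Single-knob stand-in for the real config struct (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ExtendConfigSketch {
    footprint_support_threshold_rel: f32,
}

impl Default for ExtendConfigSketch {
    fn default() -> Self {
        Self { footprint_support_threshold_rel: DEFAULT_FOOTPRINT_SUPPORT_THRESHOLD_REL }
    }
}

impl ExtendConfigSketch {
    /// Validated override. The open interval (0, 1) is an assumed validity range.
    pub fn with_footprint_support_threshold_rel(mut self, value: f32) -> Result<Self, String> {
        if value.is_finite() && value > 0.0 && value < 1.0 {
            self.footprint_support_threshold_rel = value;
            Ok(self)
        } else {
            Err(format!("footprint_support_threshold_rel must be in (0, 1), got {value}"))
        }
    }

    pub fn footprint_support_threshold_rel(&self) -> f32 {
        self.footprint_support_threshold_rel
    }
}
```

A test that asserts `ExtendConfigSketch::default().footprint_support_threshold_rel() == 0.1` is the kind of pin that makes a silent default change fail loudly.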

Tests cover each gate in isolation (recon / empty support / below-
min / compactness fail / ambiguous gap) and the three accept
classes, plus a metadata-override test that shows the class
boundaries track `neuron_diameter_um`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spatial and temporal gates from thesis Algorithm 10:

  `overlap.rs`:
  - `patch_to_frame_support` — map a patch-relative unit-L2 spatial
    factor to a full-frame sorted-ascending pixel index list, using
    the same support-threshold convention as the quality gates.
  - `overlap_count` — two-pointer merge over sorted supports.
  - `overlap_fraction` — `|a ∩ b| / min(|a|, |b|)`, ∈ [0, 1].

  `redundancy.rs`:
  - `pearson_correlation` — Pearson r over equal-length traces;
    returns 0 on zero-variance / empty input (safe "non-redundant"
    answer), clamps to [-1, 1] against f32 accumulation drift.

Later tasks compose these: Task 7's merge picks pairs with
`overlap_fraction ≥ cfg.overlap_fraction_min` and
`pearson_correlation ≥ cfg.trace_corr_min`; Task 10 drains the
resulting mutations.
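
Both gates are small enough to sketch in full. The signatures below are assumptions (pixel indices as `u32`, traces as `f32` slices), and returning 0 for an empty support is an assumed convention; the two-pointer merge, the `min(|a|, |b|)` denominator, the zero-variance safe zero, and the clamp to [-1, 1] follow the description above.

```rust
use std::cmp::Ordering;

/// Number of shared indices between two ascending-sorted supports.
pub fn overlap_count(a: &[u32], b: &[u32]) -> usize {
    let (mut i, mut j, mut n) = (0, 0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                n += 1;
                i += 1;
                j += 1;
            }
        }
    }
    n
}

/// |a ∩ b| / min(|a|, |b|); an empty support yields 0 (assumed convention).
pub fn overlap_fraction(a: &[u32], b: &[u32]) -> f32 {
    let denom = a.len().min(b.len());
    if denom == 0 {
        0.0
    } else {
        overlap_count(a, b) as f32 / denom as f32
    }
}

/// Pearson r over equal-length traces; 0 on empty or zero-variance input.
pub fn pearson_correlation(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "trace length mismatch");
    if x.is_empty() {
        return 0.0;
    }
    let n = x.len() as f64;
    let mx = x.iter().map(|&v| v as f64).sum::<f64>() / n;
    let my = y.iter().map(|&v| v as f64).sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0f64, 0.0f64, 0.0f64);
    for (&xv, &yv) in x.iter().zip(y) {
        let (dx, dy) = (xv as f64 - mx, yv as f64 - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    let denom = (sxx * syy).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        // Clamp guards against accumulation drift pushing |r| slightly past 1.
        (sxy / denom).clamp(-1.0, 1.0) as f32
    }
}
```

Dividing by the smaller support means a small candidate fully contained in a large existing footprint scores 1.0, which is the case the redundancy check most needs to catch.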

Tests cover: coordinate mapping, ascending-sorted invariant on the
support list, threshold behavior, disjoint/partial/identical
support intersections, empty-input handling, Pearson identity /
anti-correlation / affine-invariance, zero-variance safe-zero,
orthogonality over a sine-cosine pair, and f32 clamp to ±1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`merge_components` takes two components (sparse support + unit-L2
spatial factor + T-frame trace) and returns a single merged
component via rank-1 NMF on the joint reconstructed movie
`a_i c_iᵀ + a_j c_jᵀ` over the union support. Preserves NMF
scale invariance (merged result is independent of input scaling)
and gives the union support of both inputs.
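
The input to the merge's rank-1 NMF can be sketched as below: the union support and the joint reconstructed movie `a_i c_iᵀ + a_j c_jᵀ` restricted to it. `Comp` and `joint_movie` are hypothetical names and the row-major `union_pixels × T` layout is an assumption; the factorization step itself is omitted.

```rust
/// Illustrative component: sorted sparse support, matching weights, T-frame trace.
pub struct Comp {
    pub support: Vec<u32>,
    pub values: Vec<f32>,
    pub trace: Vec<f32>,
}

/// Returns (union support, movie) where movie is row-major `union.len() × T`.
pub fn joint_movie(a: &Comp, b: &Comp) -> (Vec<u32>, Vec<f32>) {
    assert_eq!(a.support.len(), a.values.len(), "support/values mismatch");
    assert_eq!(b.support.len(), b.values.len(), "support/values mismatch");
    assert_eq!(a.trace.len(), b.trace.len(), "trace length mismatch");
    let t = a.trace.len();
    let mut union = Vec::new();
    let mut movie = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.support.len() || j < b.support.len() {
        let pa = a.support.get(i).copied();
        let pb = b.support.get(j).copied();
        // Sorted merge: take the smaller pixel index, or both when they coincide.
        let (pixel, wa, wb) = match (pa, pb) {
            (Some(x), Some(y)) if x == y => {
                let r = (x, a.values[i], b.values[j]);
                i += 1;
                j += 1;
                r
            }
            (Some(x), Some(y)) if x < y => {
                let r = (x, a.values[i], 0.0);
                i += 1;
                r
            }
            (Some(x), None) => {
                let r = (x, a.values[i], 0.0);
                i += 1;
                r
            }
            (_, Some(y)) => {
                let r = (y, 0.0, b.values[j]);
                j += 1;
                r
            }
            (None, None) => unreachable!("loop condition guarantees one side remains"),
        };
        union.push(pixel);
        // One movie row per union pixel: wa * c_a[k] + wb * c_b[k].
        movie.extend((0..t).map(|k| wa * a.trace[k] + wb * b.trace[k]));
    }
    (union, movie)
}
```

Because the merge loop visits indices in ascending order, the union support stays sorted, matching the sort-order invariant the tests check.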

`MergeResult` exposes the rank-1 fit's recon_error so callers can
sanity-check the merge — a low value means the pair truly was a
single source and the merge is clean; a high value means the
redundancy gate likely over-matched and the merge shouldn't fire.
Task 10 wires this into the fit-side apply to reject stale or
bogus merges.

Tests cover: identical-source merge (trace scales 2×), scaled
copies (proportional traces), disjoint supports with
proportional traces (rank-1 merge), overlapping supports with
shared-pixel mass combination, genuinely distinct pairs (residual
error well above 0), support sort-order preservation, trace
length passthrough, and shape-validation panics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 fit ↔ extend coordination types (design §7.2–§7.3):

  - `PipelineMutation` enum: Register / Merge / Deprecate, each
    carrying the `snapshot_epoch` it was computed against. Fit
    uses the epoch to drop stale mutations (Task 10).
  - `Snapshot` struct: deep-clone of `(Ã, W, M, epoch)` that
    extend reads without racing fit's writes. Full clone now;
    row-level copy-on-write is a profile-gated refinement later.
  - `Epoch` = u64 counter; advanced only by structural mutation
    apply, not by per-frame numeric updates.
  - `DeprecateReason` enum: FootprintCollapsed / TraceInactive /
    MergedInto / InvalidApply — `'static` so mutations stay
    cheap to transport.

Footprints gains stable `u32` ids + `ComponentClass` tags stored
on each `Component`. `push_component` keeps its existing `usize`
return (position) and defaults to `ComponentClass::Cell`;
`push_component_classified` returns the stable id for the Phase
3 caller. New accessors: `id(i)`, `class(i)`, `position_of(id)`,
`deprecate_by_id(id)`, `next_id()`, `ids()` iterator.
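
The position-versus-id distinction can be sketched with a toy table that tracks ids only. `IdTable` is a hypothetical stand-in, not the `Footprints` type; it assumes ids are assigned from a monotonically increasing counter and never reused.

```rust
/// Toy id table: positions are dense and shift on removal, ids are stable.
#[derive(Default)]
pub struct IdTable {
    ids: Vec<u32>, // ids[position] = stable id
    next_id: u32,  // never decremented, so ids are never reused
}

impl IdTable {
    /// Append a component and return its stable id.
    pub fn push(&mut self) -> u32 {
        let id = self.next_id;
        self.ids.push(id);
        self.next_id += 1;
        id
    }

    /// Current dense position of a live id, if any.
    pub fn position_of(&self, id: u32) -> Option<usize> {
        self.ids.iter().position(|&i| i == id)
    }

    /// Remove a live id; returns false when the id is not live.
    pub fn deprecate_by_id(&mut self, id: u32) -> bool {
        match self.position_of(id) {
            Some(pos) => {
                self.ids.remove(pos);
                true
            }
            None => false,
        }
    }
}
```

After pushing ids 0, 1, 2 and deprecating id 0, id 1 moves from position 1 to position 0, while the next push still returns id 3. A mutation that refers to components by id therefore stays meaningful across deprecations, which a position would not.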

FitPipeline gains `epoch()` and `snapshot()`. The snapshot is a
Clone of the visible state — verified independent under
subsequent `step` calls and footprint mutation on the clone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`MutationQueue` — single-threaded harness version of the design
§7.3 bounded ring. FIFO order, drop-oldest on overflow, `drops()`
counter surfaces saturation for the archive metric.

Stand-in for the SAB ring that ships with the Phase 5 worker
runtime. Exposing the same protocol surface (bounded push / FIFO
drain / drop counter) lets Task 10's fit-side apply and later
extend-side publish paths be exercised end-to-end with no worker
infrastructure.
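
A minimal sketch of the protocol surface (bounded push, FIFO drain, drop counter), built on `std::collections::VecDeque`. The generic `BoundedQueue<T>` name and method set are illustrative and may not match `MutationQueue` exactly.

```rust
use std::collections::VecDeque;

/// Bounded FIFO that evicts the oldest entry on overflow and counts evictions.
pub struct BoundedQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
    drops: u64,
}

impl<T> BoundedQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { items: VecDeque::with_capacity(capacity), capacity, drops: 0 }
    }

    /// Push to the back; when full, drop the oldest entry first.
    pub fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.drops += 1;
        }
        self.items.push_back(item);
    }

    /// Pop the oldest entry.
    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    /// Remove and return everything in FIFO order.
    pub fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }

    /// Total evictions so far; not reset by `drain`.
    pub fn drops(&self) -> u64 {
        self.drops
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

Drop-oldest favors fresh proposals: an older mutation was computed against an older snapshot, so it is the more likely one to be rejected as stale at apply time anyway.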

Tests cover: zero-capacity rejection, empty-state shape, FIFO
push/pop, drop-oldest under overflow, drain preserving FIFO,
drops counter persistence across drains, and 1000-push stress
verifying only the last `capacity` survive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`FitPipeline::apply_mutation` and `drain_apply` wire extend's
`PipelineMutation`s into the fit state between frames (design
§7.3 "atomicity of apply"). Register / Merge / Deprecate each
extend `(Ã, C̃, W, M)` in one atomic step and advance the epoch.

Epoch validation:
  - Register always applies (proposes new state, references none).
  - Merge rejects as `Stale` if either merge_id is no longer live
    (fit advanced past extend's snapshot and one of the targets
    was already deprecated).
  - Deprecate rejects as `Stale` if the target id isn't live.

Invalid vs Stale: self-inconsistent mutations (support/values
length mismatch, or a merge that names the same id twice) return
`Invalid(reason)` without advancing state. `ApplyBatchReport`
surfaces the three counters (`applied`, `stale`, `invalid`) for
Phase 6 archive metrics.
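
The outcome rules above can be sketched against a toy state that tracks only live ids and the epoch. All names here (`Mutation`, `Outcome`, `State`, `Report`) are illustrative stand-ins. The sketch omits payloads (footprints, traces, `snapshot_epoch`) and the asset surgery, and assumes a merge registers its result under a fresh id.

```rust
use std::collections::BTreeSet;

pub enum Mutation {
    Register,
    Merge { ids: [u32; 2] },
    Deprecate { id: u32 },
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Applied,
    Stale,
    Invalid(&'static str),
}

#[derive(Default)]
pub struct State {
    pub live: BTreeSet<u32>,
    pub next_id: u32,
    pub epoch: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct Report {
    pub applied: usize,
    pub stale: usize,
    pub invalid: usize,
}

impl State {
    fn register(&mut self) {
        self.live.insert(self.next_id);
        self.next_id += 1;
    }

    /// Apply one mutation; the epoch advances only when the mutation lands.
    pub fn apply(&mut self, m: &Mutation) -> Outcome {
        let outcome = match *m {
            // Register references no existing state, so it always applies.
            Mutation::Register => {
                self.register();
                Outcome::Applied
            }
            Mutation::Merge { ids: [a, b] } => {
                if a == b {
                    Outcome::Invalid("self-merge")
                } else if !self.live.contains(&a) || !self.live.contains(&b) {
                    Outcome::Stale
                } else {
                    self.live.remove(&a);
                    self.live.remove(&b);
                    self.register(); // merged component gets a fresh id (assumption)
                    Outcome::Applied
                }
            }
            Mutation::Deprecate { id } => {
                if self.live.remove(&id) { Outcome::Applied } else { Outcome::Stale }
            }
        };
        if outcome == Outcome::Applied {
            self.epoch += 1;
        }
        outcome
    }

    /// Apply a batch in order and aggregate the three counters.
    pub fn drain_apply(&mut self, batch: impl IntoIterator<Item = Mutation>) -> Report {
        let mut report = Report::default();
        for m in batch {
            match self.apply(&m) {
                Outcome::Applied => report.applied += 1,
                Outcome::Stale => report.stale += 1,
                Outcome::Invalid(_) => report.invalid += 1,
            }
        }
        report
    }
}
```

Keeping `Stale` and `Invalid` separate lets a metrics consumer tell ordinary fit/extend races apart from malformed proposals, which point at a bug on the extend side.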

Asset surgery now supported atomically:
  - Footprints: `push_component_classified` + `deprecate_by_id`.
  - Traces: `insert_component_with_history` + `remove_component`;
    merged history = history_i + history_j with the last
    `window_len` entries overwritten by extend's fresh trace.
  - SuffStats: `insert_empty_component` + `remove_component` —
    zero-init for new rows/cols.

Tests cover: register zero-pads trace history, register rejects
shape mismatch, deprecate/merge advance epoch, stale deprecate /
stale merge don't advance, self-merge rejected, merge-sums-
source-histories invariant, drain_apply FIFO + mixed-outcome
report, and post-apply `step` correctness on both register and
merge paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Synthetic recording with 10 ground-truth cells + 1 slow baseline +
1 neuropil component over 500 frames on a 32×32 FOV, starting with
empty `Footprints`. Drives the full Fit + Extend + apply loop
inline: per-frame `step` feeds the residual bip-buffer, and every
30 frames an extend cycle runs variance-map → max-variance patch
selection → rank-1 NMF → class-aware quality gates → redundancy
check → mutation queue publish → `FitPipeline::drain_apply`.

Acceptance criteria:
  1. Epoch advances (mutations actually land).
  2. ≥ 40% cell recall at (spatial overlap ≥ 0.4, trace correlation
     ≥ 0.5 over trailing 150 frames). Trailing window skips the
     zero-pad region for late-registered components.
  3. At least one non-cell class (Neuropil or SlowBaseline) gets
     registered — class-aware gates are functional.
  4. Total component count ≤ 5 × n_cells_gt — spurious bounded.

Typical result: 6/10 cells recovered cleanly on this synthetic,
with 4 cell-class + 8 neuropil-class estimators. Higher recall is
tractable with per-recording tuning; the Phase 3 deliverable is
the infrastructure (buffers, segment, rank-1 NMF, gates, overlap /
redundancy, merge, mutation protocol, fit apply) end-to-end, not
a benchmark number.

Every tuning knob flows through `ExtendConfig` / `RecordingMetadata`
(no magic numbers) and the test documents why each override
diverges from the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's rustfmt --check caught long-line and blank-line deviations
across 13 Phase 3 files. `cargo fmt` reformat only — no semantic
changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@daharoni daharoni merged commit fe67570 into main Apr 19, 2026
7 checks passed
@daharoni daharoni deleted the feat/cala-phase-3 branch April 19, 2026 04:26

1 participant